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Abstract 

A method is described, which computes from an observed sample of events 
upper limits for production rates of particles, or, in case of appearance of a 
signal, the probability for an upwards fluctuation of the background. For any 
candidate, a weight is defined, and the computation is based on the sum of 
observed weights. Candidates may be distributed over many decay channels 
with different detection efficiencies, physical observables and different or poorly 
known background. Systematic errors with any possible correlations are taken 

into account and they are incorporated into the weight definition. It is 
investigated, under which conditions a Bayesian treatment of systematic errors 
is correct. Some numerical examples are given and compared with the results of 
other methods. Simple approximate formulas for observed and expected 
confidence levels are given for the limiting case of high count rates. A special 
procedure is introduced, which analyses input data in terms of polynomial 
distributions. It extracts confidence levels for a signal or background hypothesis 
on the basis of spectral shapes only, normalizing the total rate to the number of 

observed events. 



1 Introduction 



The operation of high energy accelerators like the LEP storage ring, HERA or 
the Tevatron opened the field of searches for new particles beyond the standard 
model. The search for the last missing standard model particle, the Higgs boson, 
was extended up to a mass of 114 GeV at LEP [8]. The analyses are often quite 
complex, because many physical channels have to be combined and sophisticated 
and efficient event taggers have been developed to find certain event topologies. 
In case of a data excess over expected background, the question comes up imme- 
diately, whether an upwards fluctuation of the background can be ruled out or 
an event excess can be attributed to the particle which is searched for. 

A lot of literature exists which adresses these questions (see the summaries 
given in refs. [1] to [3]. Many publications refer to simple counting experiments. 

In this paper, the method of fractional event counting is described, which uses 
a weighted sum over the observed events as the indicator for a signal. The weights 
(or filter function) are extracted from physical variables of the candidates, and 
they have to be defined to use the experimental information in a statistically 
optimal way. The weight optimization is done without use of observed data 
and is based on Monte Carlo predictions for the signal and background. The 
event weighting allows it to avoid hard cuts in event accpetance which may be 
subjective: precuts can be placed in phase space regions where the weight is very 
small. 

This paper summarises the current status of the method, because it has been 
used in some analyses (refs. [4] to [6]). Recently, the sensitivity of the method 
was improved by incorparating systematic errors in the filter function. In the 
past, systematic errors were finally folded in, but the basic candidate weights 
were defined without it. 

The statistical analysis uses the frequentist approach. Bayesian statistics has 
been applied in a similar way to the multichannel case too [9], and there are even 
comparative results for one physical analysis [6]. 
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2 Specification of the filter function 



2.1 Discriminating variables 

The aim of any statistical analysis of a search experiment is the distinction be- 
tween two physical hypotheses: 

• (A) The data consist of background and the physical signal. 

• (B) The data consist of background only. 

A discriminating variable, £, is introduced to order observed events according to 
their signal likeness. This variable can be the particle mass in the search for a 
resonance, a likelihood constructed from some physical observables or the output 
variable of a neural network. It is assumed that theoretical predictions for the 
spectral distributions of signal and background, s(£) and &(£), exist. 

Data may be available for more than one decay mode of a particle and searches 
may be performed at several acclerator energies by more than one experiment. All 
these results have to be combined. The data will therefore be ordered according 
to search channels. The £ variable will vary from channel to channel. 

All searches are assumed to be statistically independent. It is therefore never 
allowed that the same event appears twice. If an overlap exists, for instance 
between two final states looked at, the two corresponding channels must be rear- 
ranged into three: exclusive selection of events in the two original channels and 
the overlap between the two with a new definition of £. 

In most cases the signal and background spectra of £ will be available in form 
of Monte Carlo histograms s^i = s(6b) and = b(£ki)- Here, the index k is 
used to identify a channel and i indicates the value of its discriminating variable. 
The trivial case of event counting corresponds to the limitation to one histogram 
bin. Throughout this paper it is assumed that histograms are normalized to the 
expected rates, any bin contains the local mean rate. It may be that the total 
signal rate r = J2ki s ki has to be varied during the analysis. For later convienience, 
signal efficiencies per bin can be defined as 

Ski 

tki — 

r 

Branching ratios of decays, channel dependent cross sections and different lumi- 
nosities are incorporated into the definition. 
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If a likelihood or neural network definition does not contain the particle mass 
explicitly, but a reconstructed mass exists and is not correlated strongly to the 
likelihood, the mass and likelihood distributions D(m) and D(L) can be combined 
to define £. This will be described in section 2.3. 

Instead of any £, a monotone function of it can be used as discriminating 
variable too. Apart from binning effects, the final results will be independent of 
such a redefinition. This will be shown in the next subsection. The choice of 
£ is rather arbitrary and has to be based on physical arguments and numerical 
convenience. 



2.2 Event weights 

From and b^i, event weights Wki will be computed. The definition of Wki is 
not unique. Every new filter gives another result for the same experiment, and 
all procedures are correct on statistical average. However, different definitions do 
not have the same performance and the filter should be optimized to get the best 
separation between hypotheses (A) and (B). 

If the wm are known, the total weight of an event sample, often called 'test 
statistics' X, is defined as 



The sum extends over all candidates of an experimental data set or a Gedanken 
experiment. The indices k(l) indicate the channels and are the £ bins to 
which the events belong. 

If an experiment would be repeated many times, the resulting total weights 
show statistical fluctuations. They have to be described with probability den- 
sity functions Pb(X) and P S (,(X). These functions refer to the hypotheses (B) 
(background only) and (A) (the signal exists). They are related to the input his- 
tograms and bki and depend on the filter specification. Implicitly they depend 
on the total signal and background rates. Their computation will be described 
in detail in the next section. 

Confidence levels for a background or a signal plus background compatibility 
of a special data set with the test statistics X = W tot can be computed with 



These definitions are based on the frequentist approach. If a hypothesis is true, 
the median value of the corresponding confidence level is 1/2. A small value of 



x = J2 w Km) 




(1) 
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CL sb indicates a data deficit, if hypothesis (A) is true: CL sb is the frequency of a 
downward fluctuation of X at least to W Q b s , and somewhat unprecisely it is said 
that hypothesis (A) is ruled out with probability 1 — CL sb . Vice versa, if CL b 
is close to 1, a data excess over background is observed which will appear with 
frequency 1 — CL b , if no signal exists. 

It is now straight forward to optimize the definition of w k i in the limit of 
high rates. According to the central limit theorem the functions P sb and P b have 
approximately Gaussian shape: 

rW^X 1 / (X-<X> b )\ 

P, b (X) = -jL- eM J X -<X>^ ) (2) 



The expectation values of X are given by 

< X > b = u; ki b ki < X > sb =< X > s + < X > b = Y w ki (re ki + b ki ) (3) 

k ^% k ^% 

The sums extend over all channels and £ bins. The statistical errors introduce 
the variances 

°l = w tAi ° 2 sb = cr 2 s +(rl = J2 w U re ki + b ki ) (4) 



Criteria for optimal discrimination between hypotheses (A) and (B) are: 



• (i) The mean confidence level for interpretation of an arbitrary test statistics 
X from the background source (B) as signal plus background (A), 

< CL sb > b , should be a minimum. 

• (ii) The mean confidence level for interpretation of an arbitrary test statis- 
tics X from the combined signal and background source (A) as background 
(B), < CL b > sb , should be a maximum. 



The first confidence level is simply the mean probability that an arbitrary Gedanken 
experiment with signal and background events has a total weight less than or 
equal to the weight of an arbitrary experiment counting background. The second 
confidence level is complementary so that both optimization criteria are identical. 

In the high rate limit the probability densities at X = are negligible and 
one gets with equations (1) to (4): 
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1 f°° 

< CL sb > b = / dX ■ exp 

V 2tT(T h J-oo 



(x- < x > b y 



\'2na b J-oo 2cr 6 

.2 



i /- x (y- < j 

, = / dY ■ exp( — 



On the left hand side, the following conventions are introduced: The brackets 
indicate the statistical mean value. Both physical models appear in the equation. 
The events consist of background, which is indicated by the index '6' on the 
left hand side, but they are analysed by the observer in terms of signal and 
background {CL sb ). The double integral can be simplified to 

1 r-<x> s z 2 

< CL sb > b = - / dZ ■ exp(- ) (5) 

^/2n ■ (a 2 sb + 2a 2 b ) J-°° + 

where 

< x >s= J2w ki s ki 

k,i 

is the expectation value of X for signal events. The probability < CL sb > b 
depends on the ratio < X > s / ^ a 2 sb + 2of only which has to be maximized. 
Because a common scale factor in all Wki cancels out in the confidence levels, 
the mean value < X > s can be fixed. The optimization criterion is then, with a 
Lagrangian factor A, 

d(a 2 b + 2a 2 ) > d<X>. _ Q 
dw ki dw ki 

After multiplication with a common constant factor the result becomes simply 

Ski + ^Oki 

The factor 2 appears because the width of the background distribution enters 
twice. The weight for a specific channel and bin is independent of the use of any 
other channel or bin. 

Equations (6) and (1) to (4) are sufficient to compute confidence levels for a 
data set in the high rate approximation. 

The above optimization is not unique. There are other possibilities: 



• (iii) One may look for a bound on a predicted rate r at a requested con- 
fidence level CL. A fixed CL is equivalent to a cut in the signal plus 
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background distribution of X at X cut = r ■ J2k,i e kiWki+ < X > b —K ■ a sb , 
where K is the number of standard deviations equivalent to CL. The prob- 
ability for an upwards fluctuation of the background above the cut X cut 
should be a minimum. In the high rate limit the expected value of this 
probability is 

1 r x <™* (X- < X > h ) 2 

1 - E[CL h \ sb = 1 - / exp(-^ l-T^- (7) 

y2na b J-oo Za b 



Countrary to eq.(5), a sample of signal and background events is analysed 
in terms of background. The confidence level CL b is not averaged over the 
whole signal plus background distribution, it is computed at a fractile of it, 
which is related to K. One gets the condition 

(rT,k,CkiWki ~ Ka sb ) 2 

= max. 



(iv) One can optimize the chance to find a signal, which exceeds the back- 
ground prediction at a requested confidence level CL. This confidence level 
corresponds to a cut in the weight distribution at X cut = J2k,i w kibki + Ka b . 
The maximum chance to detect a signal is obtained by minimizing the 
probability for a downward fluctuation below X cut : 



1 r x ^t (X- < X >*h) 2 
E[CL sb ] b = / exp(- (A < A 2 >sb) (8) 

(EkArekiWki- Ka b ) 2 



(v) The measurement of a hypothetical signal rate is most significant, if the 
ratio 

< X >l la 2 sb 

is maximal. This request corresponds to the special case K = in item 
(iv): the probability that the total weight of signal and background events 
exceeds the median background level is maximized. 



The functional form of w is obtained in the same way as eq.(6). After request- 
ing a fixed sum ^2 k i w^ki to set the w k scale, computation of the derivatives with 
with respect to Wki, absorption of all k and % independent sums into common con- 
stants and a final renormalization one finds in any case the functional form 

ti^ki + Oki 
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This general result contains a free rate parameter R, which has to be tuned to 
fulfill a specific optimization criterion (i) to (v). The confidence levels are invari- 
ant against multiplication of all Wki with a common factor. For definiteness, the 
normalization constant TZ is introduced to adjust the the overall maximum weight 
to 1, but this factor could also be dropped. To garantee a positive denominator 
in any case, R should be positive. 

In general, R is not equal to the signal rate r, but it is proportional to it: 
R = c ■ r. For condition (v) (best rate measurement) one gets R = r. In the 
special case (iii) with K = 0, the observation of a signal at its median value 
and minimum upwards fluctuation of the background, the result is R = 0, which 
means that the weight is proportional to the signal to background ratio. Equation 
(6) is contained as the special case R = l/2r, which is a good compromise. 

If no signal exists at all, an observer will try to find the lowest possible upper 
limit ricL for it. Requesting a definite number of standard deviations K for the 
limit and assuming that only background is observed at its median level, the 
observer has to solve the equation 

n CLtkiW k i - Kcr sb = 

k,i 

The error a s b depends on the expected limit ucl- Derivation of the last equation 
with respect to w^i and setting dncLl 'dwki = gives eq.(9) with R = hcl- This 
is a self consistency relation between the expected rate limit and the parameter 
R, which depends on K. 

Solution (9) depends on the to bki ratio only and is therefore invariant 
against £ transformations, which rescale both distributions with the same 
(£ dependent) factor. 

It was derived in the high rate limit, but can be applied at low rates too. In 
this region it is not expected to be optimal anymore but it is still very close to 
the optimum and it gives still bias free results. Of course, the simple analytic 
formulas for the confidence integrals and the results for the R values given here 
will break down. 

Throughout this paper it is the understanding that the weight algorithm, in- 
cluding the parameter R, is fixed a priory and not fitted to observed data. This 
makes it necessary to generalize the criteria (i) to (v) to non-Gaussian distri- 
butions and to compute the functions P s b, Pb and the expected confidence levels 
< CL s b >b, E[CL s b]b and E[CLb] s b numerically, using theoretical predictions for 
£fci, bki and r. The parameter R has to be varied until a requested optimization 
criterium is fulfilled. 
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This ambiguity in denning the weight function is very confusing. As will 
be shown later, the optimization procedure allows variations of R within rather 
wide regions, if little numerical tolerances of the expected confidence levels are 
accepted. Nevertheless, the result for a specific data set is R dependent. At 
large rates, this effect if often small. However, in low statistics experiments the 
analyses become rather ambigious. On statistical average all results would be 
correct, but one has to select one parameter without introducting subjectivity. 

In many cases the signal to background ratio (R = 0) is the suitable choice. 
This is especially true, if some signal is observed, but no theoretical prediction for 
the cross section exists. An expected signal rate is not needed to define w and the 
function P b can be used to compute the probability for an upwards fluctuation 
of the background to the measured test statistics. 

If a definite signal prediction has to be checked, the value R = r/2 is the 
appropriate choice. 

For the determination of upper bounds the expected limit ucl can be mini- 
mized. An example is given in sect. 4.2. This strategy works, if the background 
is sufficiently large. 



2.3 Two discriminating variables 

In the case of two weakly correlated physical variables like particle mass m and 
likelihood L, a likelihood inspired definition of £, following equation (6), is 

t = D sm {m)D sL (L) 

§ D sm (m)D sL (L) + 2D bm (m)D bL (L) { ' 

where the D's are probility density functions and the indices indicate the physical 
observables and signal (s) or background (b). This procedure has been used in 
Higgs searches of the OPAL collaboration [7]. Equivalently, the £ definition may 
be based onto formula (9). If a larger correlation between m and L exists, it can 
be reduced by a linear transformation in the m — L space before applying (10). 

The common use of eqs.(10) and (6) has the property that the weights Wki 
and the discriminating variable ^ki are identical, if the two physical variables are 
really uncorrelated, i.e. the product ansatz is correct. Any deviation indicates 
the presence of correlations or unacceptably large fluctuations in the Monte Carlo 
samples used to generate the histograms. * 

* Indeed the observation of a few anomalies triggered additional Monte Carlo simulations in 
ref.[7] 
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2.4 Related approaches 



An alternative approach, quite often used, is the ordering of experiments accord- 
ing to the likelihood ratio L^IL^ between the signal plus background and the 
background interpretation of a data set (ref. [13] to [15]). Poisson statistics gives 
for this ratio 

T It I \ nk,i( s ki + bki) ^ ' ^ 

L sb /L b = exp(-r) (M (11) 

llk,i °ki 

where r is the total signal rate and n(k,i) is the number of candidates observed 
in the bin combination (k,i). The likelihood ratio method is equivalent to a 
weighted event counting with the filter function 

w kl = ln(l + ^) (12) 

Oki 

The power expansion in terms of the signal to background ratio is 

b kl 2 61, 3 bl + 

This can be compared with twice the expansion of eq. (6) It turns out that the 
first two terms agree and the difference of the third terms is sf,/ (12 • b\j) only so 
that the results of both methods will be very similar in most cases. 

Significant differences are possible, if one or more candidates are present in 
phase space regions where s k i » b ki . t. 

Because eq.(12) has a singularity, it can produce spurious discoveries, if the 
background distribution has a systematic fluctuation in it, which is not handled 
properly in the statistical analysis. The methods presented here do not check at 
all whether the underlying distributions e^, b ki are consistent with the observed 
pattern. They take the theoretical distributions for shure and ignore the fact 
that in a very low background region the systematic background error may be 
substantial. Contrary to (12) ,eq.(9) approaches a constant event weight in the 
limit bki — > and is thus robust against this kind of effects. 

It is an important advantage of (9) that it can be generalized to incorporate 
systematic errors, which destroy the statistical independency between £ bins, 
assumed in eq.(ll). 

' An effect of this type, introduced by one candidate, is visible in an earlier LEP combination 
of Higgs searches [6] . It had no impact on the final result because the candidate mass lies well 
above the combined mass limit. 



9 



Definition (9) is related to the maximum likelihood fit of the signal rate. The 
logarithmic derivate of the likelihood is 

d In L sb _ J_ dL sb _ X _ 
dr L s b dr r 

with 

Cfc(i) ' r 



which is equivalent to (9) with R = r. The likelihood fit determines r from the 
condition X = r. 



3 Weight distributions 



3.1 Folding procedure 

After the weight function has been specified, density distributions D(£) 

have to be transformed into distributions of w, called P\(w) for one event. The 
symbol D stands for s or b. The histogram conversion is illustrated in fig.l. 
The cumulated integral Jq cu * P(w)dw at a special value w cut is illustrated by the 
shadowed area. In case of histograms, all £ bins with Wk% < w cut have to be 
counted. A cumulated spectrum can be converted into the differential one by 
taking bin-to-bin differences. The central w values of the bins will be assigned 
to all predicted and observed events in that bin. The analytic formula for a 
continuous function is 

™=?if& (13) 

The sum appears because the backward transformation from w to £ is not unique. 
All solutions of the equation w(£) = w contribute. 

Differential histograms Pi(wj) may have many gaps, but these are never pop- 
ulated by Monte Carlo or data events. The extremest case would be a delta 
function at w — 1 for simple event counting. Because the distributions are not 
constant inside the bins, binning effects can finally introduce relative errors of 
the order of l/(< w > -number of bins) in rate limits. 

The distribution of X = Y%=i w k(i)i(i) f° r a fixed number of n events can now 
be computed from the distribution for one event by iterative folding: 



/■min(l,X) 

P n (X) = / P n ^(X - w) ■ P^w) ■ dw (14) 

J max(0,X— {n— l)max(«j)) 
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The integration limits garantee that the arguments can not become negative or 
exceed their upper limits. In general, these equations have no analytic solutions 
and must be evaluated numerically by matrix multiplication. The stepwise evolu- 
tion of P n for a Gaussian signal and constant background is shown in fig. 2. The 
singularity at the upper end of P±, due to the maximum in the £ distribution, 
survives as a step for n = 2 and as a vertical slope at the upper end for n — 3. 
At n = 4 all discontinuities have disappeared. 

If the rates are large, many folding operations are necessary, but the results 
are needed for an n interval only, whose lower and upper bounds n min , n max have 
to be selected to reach a requested accuracy. It is then a faster procedure, to 
double the event numbers in every folding step until the minimal value of n is 
reached, and to keep the distributions for n = 2 m with integer m for subsequent 
use. It is not necessary to compute folding integrals for any n. Distributions 
in the high n region can be computed partly with interpolations because the 
shapes are almost stable. To speed up numerical computations, it is also possible 
to combine two histogram bins into one, if the number of X bins per standard 
deviation exceeds a cut with increasing n. This process can be iterated. 

Finally the Poisson distribution for appearance of n events has to be taken 
into account. If n is the mean rate, the final probability density is 

oo -n 

P(X) = exp(-n) • 8(X) + £ exp(-n) • — • P n (X) (15) 

n>X/max(w) 

For given X > 0, only the terms with n > X/max(u>) can contribute. 

The last formula is used to compute the complete distribution function Pb(X) 
for background events. 

It can be written down for signal events too, and the result P S (X) has to 
be folded with Pb(X) to get the overall distribution for signal and background, 

Psb(X). 

It would be a nasty job to repeat many folding operations, if the signal rate 
r has to be modified iteratively. Therefore, the P n distributions for fixed num- 
bers of signal events, now called P sn , are folded with the complete background 
distribution Pb(X) from (15). To compute confidence levels, only the cumulated 
distributions are needed: 

rX ;-min(Z,n-max(ui)) 

C n (X)= dZ- dY ■ P b (Z -Y)P sn {Y) (16) 

Jo Jo 

The cumulated distribution for the sum of signal and background is then 

,-X _ n=n max n 

CL sh (X)= P sb (Y)dY = exp(-r)-(exp(-5> fc ,)+ ]T - • C n (X)) (17) 
Jo „^ . nl 

k'i n—n mm 
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These results can now be used to compute the expected confidence levels (5), (7) 
and (8), needed to tune the R parameter: 

E[CL sb ] b = CL sb (X cut ) with CL b (X cut )=CL 
E[CL b ] sb = CL b (X cut ) with CL sb (X cut ) = CL 

The parameter CL is the request in criterion (iii) or (iv) and X cut has to be 
computed from it by inversion of (1). 

The expectation values needed for criteria (i) and (ii) are 

[•CO 

< CL sb > b = / CL ab (X)P b (X)dX 

Jo 

< CL b > sb = / CL b (X)P sb (X)dX 

Jo 

As already shown, both expectation values have their optimum at the same value 
of R, which may now be a bit different from r/2. 

An alternative method to compute the series of folding integrals (17) is given 
in ref.[15], where fourier transformation is applied. 



3.2 Some analytic results 

The functions Pi(w) and their statistical moments are known analytically in a 
few cases. All refer to the limit R — 0, which means either a small signal to 
background ratio or the lowest probability for a background fluctuation up to 
the median signal plus background level (criterion (iii)). In any case a constant 
background is assumed. 

• Gaussian signal. 

The w distribution, its mean value and its mean square for one signal event 
are 

p sl ( w ) = ^=L== <W>S =-L < w 2 > s = -i= (is) 

At the signal maximum the weight is set to 1. The background events are 
distributed according to 

P bl (w)=jV y^L= (19) 

w ■ v — In w 

This equation contains a normalization factor j\f and with j\f = 1 it gives the 
total background rate per w interval. The width refers to the signal and 
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dB /d£ is the differential background rate. The expression is not integrable 
at w — 0, because an infinite number of events is taken into account far away 
from the signal. After truncation of the £ spectrum the integral converges. 
The total mean and variance of w are finite even without the cutoff. 

• Breit Wigner resonance over a constant background. 
The convention here is 

Die) ~ 

The distribution, the mean and mean square of w for one signal event are 

n , N 1 1 2 3 

P S1 {W) = < W > s = - < W > s ~- 

7T • Jw ■ (1 — w) 1 

The background distribution is 



P bl (w) =Af ^ 



W • \ w ■ (1 — w) 



• Two-dimensional Gaussian. 

Two independent discriminating variables are distributed according to 
D(e, V ) ~ exp(-(£ - £o)7(2^f)) • exp(-(77 - Vo) 2 / (2a 2 )) . Instead of 
eqs. (18,19) one has 

1 1 

P sl (w) = 1 < w > s = - <w 2 > s =- 

P bl {w)=M —-j^r 

w dtdr] 

From these results one gets the parameters needed for the high rate estimates of 
confidence levels in sect. [2]: 

• Gaussian signal 

v r v fn — dB 2 r dB 

• Breit Wigner signal 

r v dB 2 3 2 1 dB 

<X> s =- 2 <X> b =^— a s = -r <r b = -^ 

• Two-dimensional Gaussian 

r d 2 B r d 2 B 

< X > s = - < X > b = a s = n °l = TT^S" 



2 ° " "dedri s 3 6 ? "dEdrj 
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4 Applications 



4.1 Upper limits without background subtraction 



If nothing is known about size and spectral shape of the background, upper limits 
for a signal rate can be obtained by ignoring the background in (17). The function 
CL sb has to be replaced by 

rX n =nm ax j,n /■min(X,n-max(u;)) 

CL S (X)= P s (Y)dY = exp(-r).(l+ £ -r • / dY ■ P sn (Y)) 

Jo „±~^ . n\ Jo 



rain 



(20) 

Apart from trivial event counting the only meaningful ansatz for the weigths, 
valid for one search channel, is 

Wki = — r (21) 

max(s fc j) 



The upper rate limit is obtained by solving (20) for r, if CL S is given. 

The 95% exclusion limits (CL S = 0.05) for Gaussian and Breit-Wigner 
£ distributions are shown as functions of the test statistics in fig. 3. For com- 
parison, the figure contains some dots marking the 95% confidence limits from 
Poisson statistics without spectral sensitivity. In this case, the abscissa values 
are the observed event numbers. 

Fig. 4 shows a 95% signal exclusion plot, which has been computed from 
three observed events, using their measured masses and varying a hypothetical 
reconance mass. Accidentally, two of the mass values are almost identical. The 
mass resolution is assumed to be Gaussian. 

The results obtained with eq.(20) are given by the solid line. The curve has 
kinks at the rate limit 5.2. This effect is visible in fig.3 too. It is due to the 
singularity of the distribution P\(w) at w — 1 (see fig. 2). At the positions of the 
candidates, the rate limits are slightly worse compared to the Poissonian limits of 
4.74 for one and 6.30 for two observed events. These more pessimistic results from 
fractional counting arise from theoretical configurations with low test statistics, 
which have more events in it than the observed numbers one and two. This is the 
prize one has to pay for mass selectivity. In mass regions away from the observed 
candidates the rate limits from fractional counting are more stringent than the 
Poissonian limits. 

The problem of getting mass selective rate limits without background sub- 
traction has been addressed earlier. Gross and Yepes [10] use fractional event 
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counting too. The weight is defined as the probability that an arbitrary event 
has a larger mass difference with respect to the hypothetical particle than the 
candidate. In the original publication the incorrect assumption had been made 
that the confidence limit for an integer number of fractional counts is equal to 
the Poissonian limit. The exclusions were too stringent. Nevertheless, the ansatz 
for the weight is a legal alternative, and the rate limits obtained with it, using 
the folding procedure (20), are added in fig. 4. The algorithm produces sharp 
spikes at the candidate masses. 

Another formalism to construct confidence levels was given by Grivaz and 
Diberder [11]. They use a formula which looks like the integral (20), truncated 
at the number of observed events: 



Here, the B n 's are probabilities that an arbitrary mass configuration is less likely 
than the configuration of the n measured events closest to the hypothetical mass. 
The B n y s are taken from the x 2 distributions of n masses. The algorithm is 
not equivalent to independent event counting. The authors have shown that the 
expression can not be interpreted directly as a confidence limit. It has some bias, 
which can be corrected for. Numerical results are included in fig. 4. They are 
very similar to those of this work. 

4.2 Upper limits with background subtraction 

If the background is known without any systematic error, a rate limit correspond- 
ing to a confidencve level CL could be determined from the condition 



The r dependence is given by (17), which contains the Poisson distribution. 

As is well known, this procedure becomes problematic, if the observed weight 
sum W bs is less than the expectation from background. Equation (22) may have 
no solution, which means that the frequency of appearance of the observation is 
less than CL for any signal rate r > 0. Mathematically, this is allowed and could 
be due to a statistical fluctuation. The problem may even survive if systematic 
errors are added. 

At this point a subjective element is introduced: To garantee that even in 
exceptional cases the rate limit is conservative, the criterion for its determination 
is often sharpened to [12, 13, 14]: 



E{n obs ) = V exp(-r) ■ — ■ B, 

— ' 77, 



n 



n=0 




(22) 



15 



• The probability to observe a weight sum X less than or equal to the mea- 
sured value W b s , if the background contribution alone is already < W a b s , 
has to be less than CL. 

This ansatz is motivated by the Bayesain treatment of background subtraction 
in counting experiments [12] and it gives an overcoverage by definition. To apply 
this condition, the following equation has to be solved for r: 

cl = cl ^=W£; (23) 

Contrary to condition (22), this equation has a unique solution for any value of 
CL. 

Alternative prodedures have been published which avoid the overcoverage as 
much as possible. The unified approach of Cousins and Feldman gives confidence 
belts instead of one-sided limits and has been applied to the Poisson and the 
Gaussian distribution [16]. At low rates r, the confidence intervals are not central, 
and the upper limits are higher than those computed with (22). The results 
are more stringent than those of (23). Algorithms with optimized coverage for 
the Bayesian procedure have been investigated by Roe and Woodroofe and a 
connection to (23) in the Poisson case has been found [17]. 

The reason for adopting (23) is safetiness of upper limits. If no event is ob- 
served at all, eq.(23) simplifies to CL = exp(— r), and any systematic background 
error cancels out. 

Figure 5 shows the 95% exclusion limits on r (CL = 0.05) for a Gaussian signal 
and constant background. The background level is varied. It is parametrized by 
its mean contribution to the test statistics (3 =< X > b = V2na^^. Asymptotic 
limits can be obtained in the following way: The expected background contribu- 
tion f3 can be subtracted from the observed test statistics X before the rate limit 
is computed with 20. This would shift the result without background subtraction 
by an amount < ^ > = y/2(3. These asymptotic limits are reached for X > (3. 

To get the results in fig. 5, the weight definition (21) was used again, which is 
equivalent to R = in eq.(9) 

According to the previous section, this is not the best choice for the filter. 
Figure 6 shows the optimization of the R parameter for one special background 
level. It is based on the median expected limit E[ng^}i, , the rate r which corre- 
sponds to 1 — CL S = 0.95, if background is observed at its median level (3. Only a 
very weak dependence of E[n 95 ]b on R of the order of a few per mille can be seen. 
The limits from a real observation can vary by several %, however, depending on 
the £ positions of the events, and in general they are a monotone function of R. 
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Fig. 7 gives the median expected limits as a function of (3. The lower curve 
indicates the R parameters used to get these results. At large (3, one has R ~ 
1/2 • E[n^]b. The difference to the above estimate R m E[ng^,]b is due to the 
fact that finally the limit computation is based onto CL S and not onto CL s b. 
Below (3 — 1, the optimization becomes problematic. Poisson fluctuations play 
a significant role. There are several local minima of E[n 95 ]b, if R is varied, and 
no solution has an obvious advantage over the other. The R values given in fig. 7 
have to be considered as upper bounds on R; they are downward extrapolations 
consistent with R = 0.4 • E[ng5]b, which is the unique result around (3 — 2. 

Fig. 8 is completely analog to fig. 5, but now the optimized R values from 
fig. 7 are used in the analysis. It should be noted that the test statistics X is 
not the same in figs. 5 and 8, and in the latter case it is also (3 dependent. The 
dashed curve for (3 = corresponds to the Poisson distribution, because for any 
finite R and (3 — the algorithm does normal event counting. 

For small finite (3 the results depend strongly on R. This ambiguity is illus- 
trated in fig. 9. The example of fig.4 with 3 measured particle masses is analysed 
again. This time it is assumed that a background of 3 events is predicted within 
the mass region of the plot, and this background is subtracted. It corresponds 
to (3 — 0.88. Three exclusion curves are shown, which look completely different, 
but are all legal results. The parameter R = 4 gives an exclusion which looks 
somewhat obscure, but it lies above the bound from fig. 7, which is approximately 
R — 1.6. The second exclusion curve corresponds to this value. The third curve 
for R = corresponds, apart from background subtraction, to the result in fig. 4. 
This weight definition is known to be non-optimal. In spite of this, it is recom- 
mended to keep the maximal mass resolution and to use R = in low statistics 
experiments. 



5 Confidence levels from the shapes of distribu- 
tions 

The algorithms described in sect. [2] do not check the correctness of the £ distri- 
butions. A large value of of a measured test statistics X b s , normally indicating 
a discovery, might also be due to an accumulation of mismodelled low weight 
background events. A statistical test which does not compare the total observed 
rate with a prediction and is sensitive to the local signal to background ratios 
only, can be done in the following way: The probability for an event, observed at 
to be a signal event, is 

Ski 

Pki = —7- 

Ski "T Oki 
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An arbitrary set of n b s events obeys the polynomial distribution. From the 
observed candidates a likelihood 



nobs 

Lpoly = ]J Pk(l)i(l) 
1=1 

can be formed. A confidence level CL po i y can be defined as the probability that an 
arbitrary experiment with the same number of candidates gives at most the likeli- 
hood of the observed configuration. This analysis can be done with a background 
or a signal plus background interpretation of data. Values of CL po i y between 0.16 
and 0.84 indicate consistency with the tested model within 1 standard deviation. 
If CL po iy has a normal value for the background interpretation, but a low value 
for a signal plus background interpretation, a discovery is ruled out, even if CL b 
is close to 1. Vice versa, a large CL po i y for the background interpretation sup- 
ports a discovery. If CL po i y has normal values for both interpretations, the test is 
not conclusive, either because the spectral shapes of signal and background are 
similar or because the expected signal rate is too small. 

To compute the confidence levels, the distribution functions of L po i y are needed. 
The variable L po i y can be replaced by its logarithm. The test corresponds then 
to fractional counting of a fixed number of events with 

w ki = In ■ 



Ski + b 



hi 



The folding procedure is the same as in sect. [2]. The algorithm has the same 
disadvantage as the likelihood ratio method: a singularity, this time at Ski=0. 
To avoid numerical problems, pki has to exceed a minimum value. A continuous 
upwards shift of this cut p cut removes one candidate after the other from the 
sample, until the results are not anymore conclusive. The values of CL po i y jump 
at the discontiuities. 

As a toy example, fig. 10a shows a Gaussian signal peak, a linearly falling 
background and a pattern of candidates. The mean values are 100 (background) 
and 20 (signal), the resolution is 15 bins. Compared to the background model, 
the sample contains too many events (130). The toy experiment is analysed 
at the hypothetical signal position. The normal analysis gives confidence levels 
CL b = 0.993 and CL sb = 0.45, which might be a weak indication for a signal. 
Fig. 10c shows the confidence levels CL po i y as function of p cut . The smooth curves 
are two theoretical predictions: background events at the median level of the 
test statistics are analysed in terms of signal and background (lower curve) and 
signal plus background is investigated assuming all events are background (upper 
curve). The number of accepted events as a function of p cut is given too. The 
falling sensitivity with decreasing number of events is obvious from the pictures. 
Over all, the comparison of the observed CL po \ y distributions with the median 
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expectations shows somewhat better agreement with the background prediction 
and the test does not support a signal interpretation. Probably the background 
is underestimated. 



6 Systematic errors 

6.1 Parametrization of systematic errors 

The treatment here is limited to symmetric systematic errors, described by Gaus- 
sian distributions. As a consequence, confidence levels shifts are proportional to 
the mean squares of errors, if the latter are small. Asymmetric errors modify the 
expectation values < X > b and < X > sb in first order and have larger impacts. 

The errors are classified according to sources j. In principle every source may 
influence the £ spectra of signal and background in all channels. It is parametrized 
by error functions <t^ and crfj.^ whose absolute values are the rms errors, given 
binwise. 

For the technical handling the following rules are introduced: 

• Errors from the same source are treated as fully correlated between different 
bins of a signal or background histogram. The signs of the error functions 
give the signs of the correlations. 

• Errors from the same source are treated as fully correlated between signal 
and background. 

• Errors from the same source are treated as fully correlated between different 
search channels. 

• Errors from different sources are treated as completely uncorrelated. 

• The total relative error is much less than 100%. 

One comment on the independency of error sources is indicated. It could be that 
the spectra Ski and bki are available in an analytic form depending on parame- 
ters with correlated systematic errors. The error matrix can be diagonalized to 
remove the correlations. The assumption of independent sources is therefore no 
limitation. 
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Examples for complete independency are statistical uncertainties of Monte 
Carlo simulations. Considering signal and background, the number of error 
sources is twice the number of channels. 

The last assumption on the error size is somewhat critical. For instance, the 
error due to a mass resolution becomes asymmetric far away from the mass peak 
and it has the same order of magnitude as the spectrum itself. However, it will 
be shown later that bins, where this happens, may be dropped anyway. 

The effect of systematic errors on confidence levels are most easily studied 
with Monte Carlo simulations. To this aim, the input spectra have to be modified 
according to 

*« = s* + E<#iG ( 24 ) 

3 
3 

where the Q are Gaussian random numbers of mean zero and variance one. 

A major problem of eqs.(24) is the fact that error functions corresponding 
to likelihood or neural network variables are not well known, if known at all. 
Usually, systematic errors are evaluated by modifying Monte Carlo simulations 
and counting the rate changes above an effective selection cut. In this situation, 
an additional approximation is needed: 



• For a given channel, the errors have the same dependence on the discrimi- 
nating variable as the signal or background distributions: 

= (25) 

Ab) _ r(6)L 
a j,ki - d jk b ki 

Here, relative errors 5^, are introduced which are source and channel specific, 
but bin independent. In general this ansatz is not true and dictated by lack of 
knowledge. Nevertheless it has been applied in searches (for the Higgs boson, see 
ref . [8] and references therein) . 



6.2 Correction of confidence levels in the frequentist ap- 
proach 

In this subsection it is assumed that the test statistics X is a continuous vari- 
able. The distribution functions P s b(X) and Pb(X) are not allowed to have delta 
function like singularities. 
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In the following considerations, the event source and the analysis hypothesis 
are the same: background events are analysed in terms of background, the anal- 
ogous is done for the combination of signal and background. The indices at the 
functions CL sbl CL b are dropped for simplicity. 

If correct production rates are used and the analysis hypothesis is correct, 
the CL values are uniformly distributed between zero and one. This is not true 
if many potential observers use parameters shifted by systematic errors. The 
distribution functions D(CL rec ) of reconstructed confidence levels CL rec get ob- 
server dependent slopes. Because CL rec has a lower and an upper bound, the 
function D rec (CL rec ), averaged over all observers, peaks at and 1. Without 
any correction observers claim to often data deficits or excesses, as illustrated in 



It is obvious how a correction can be done (see fig. 11): The distribution D rec 
has to be integrated up to the reconstructed confidence level CL obs of a certain 
observer and the integral is the corrected confidence level: 



This procedure has to be applied independently to CL b and CL sb . The problem 
is now that an individual observer can not reconstruct the function D rec , because 
it is based on smearing of the true physical parameters, which are unknown. The 
observer will take his own spectra s(£), &(£) instead of the true ones to evaluate 
the correction of CL obs . This replacement is unavoidable and causes deviations 
of the average CL corr distribution from uniformity. 

In the high rate approximation with Gaussian X distributions, the ansatz (26) 
should reproduce the naive result that statistical and systematic errors have to be 
added in quadrature. This is the case indeed, but the proof needs an additional 
assumption. 

As already mentioned, an observer will start with original signal and back- 
ground spectra Sk%, bki, from which the weights X Q are contructed. The mean 
value and the rms error of X Q are denoted by < X Q > and a Q , respectively. The 
signal and background distributions will be modified according to eqs.(24). The 
new spectra are used to redefine the event weights. The original spectra can be 
inserted into eqs.(3) with the modification that the new filter function is used. 
The mean value of the test statistics < X Q > and its rms error a a will be shifted 
to < X > and a. The complete folding according to sect. 3.1 gives the function 
CL or ig(X), which expresses the original confidence levels as a function of the 
modified test statistics X. The modified spectra have to be analysed too, result- 
ing in the the distributions P rec (X, () with the statistical parameters < X* >, a* 



fig.ll. 




(26) 
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and its integral CL rec (X) = P rec (Y, () ■ dY. For clarity, it is written down 
explicitly that P rec depends on the Monte Carlo variables (. 

The frequency distribution of CL orig is constant by definition, because it 
describes the outcome of repeated measurements analysed with the same weight 
function. Any modified spectrum gives therefore a contribution 



D-reciC L rec )dC L rec — dCL or ig — dX~~ ' 



(27) 



to the integral (26). The CL rec variable in the integration (26) may by replaced 
by X, and a summation has to be performed over all Monte Carlo experiments. 
This leads to the final result 

/oo 
<^C ' Psys(C) ' CL ori (X*) (28) 
-oo 

The integrand contains the parameter X*. It has to be computed from the 
condition 

CL rec = P rec {YX)dY (29) 



which fixes the reconstructed confidence level to CL rec . 

One needs now a relationship between the shifted and original distributions 
of the test statistics, P rec (for ( ^ 0) and P or i g (at ( = 0), and eq. (29) has to 
be solved. Without loss of generality, we restrict ( to one random variable. Both 
distributions are approximated by Gaussians, and the ansatz for its arguments, 
which is the mentioned assumption, is 

X— < X* > _ X Q — < X > a sys 
a* a a 

where a sys is the systematic error of the expectation value of the test statistics. 
The request of constant CL rec , eq.(29), leads to a linear relationship between the 
integration limits X*, X Q and the random variable (. Eq.(28) becomes 

1 r°° „ rX +(-<T 3ys /<T (Y— < X >1 2 

CL corr = J- / d( ■ exp(-C 2 /2) • / dY eM~ * ° J ) 

Z7T J-oo J-oo ACT* 

After a shift of the integration variable Y and inversion of the order of integrations 
one obtains the desired result 

nr 1 l x - , ~(Y- < X >)\ JV 



22 



6.3 Bayesian handling of systematic errors 



Usually systematic errors are treated with the method introduced by Cousins 
and Highland [18]. It is trivial to generalize the Poissonian case, described in the 
original publications, to the situation with event discriminators in many channels. 

As mentioned, an observer does not know the true £ spectra, but only his 
own estimates s^, b^. The set ( of stochastic variables describes now the possible 
variants of the true spectra. Again a function P sys (() is introduced, which is here 
the Bayesian probability that the set ( is the correct one. The reconstructed 
confidence levels depend on (. Now two different statistical methods are mixed: 
To include systematic errors, the confidence levels from the frequentist approach 
are folded with the observers believing about the true spectra: 

/+oo 
CL rec (X, Q ■ P sy s(()d( (30) 
-oo 

Here, X is the measurement of the observer. 

Theoretical spectra enter the analysis twice: they are needed to construct 
the filter function and the absolute rates are used in the statistical analysis of 
sect. 3.1. It is always a matter for debates whether systematic errors should be 
assigned to the weight definition and it became practice to keep this filter 
fixed [6, 8]. The argument for this is that the filter definition is arbitrary and, 
on statistical average, the results are correct for any fixed definition. Within 
the frequentist approach, this argument is not correct for a principle reason: It 
is logically impossible that different potential observers use the same numerical 
parameters for data analysis. Every observer will construct his own filter function, 
and the only agreement which could be reached, is a common value of the ratio 
R/r relevant for eq.(9). However, the comparison of the frequentist and the 
Bayesian approaches is simpler with a fixed weight definition, which is adopted 
in the following. One has therefore X Q = X and < X Q >=< X >. 

Equations (30) and (28) look completely different. This rises the question, 
under which circumstances the Cousins and Highland approach gives a uniform 
distribution of confidence levels. 

A wide class of X densities, for which both approaches agree, can be con- 
structed assuming shape invariance of the X distribution: 

P rec (X, C) = F( X ~ <X * > ) P ong {X) = F{ X ~ <X> ) (31) 
a a 

The denominators are the rms errors of the test statistics and F is a common 
function. A constant reconstructed confidence level means that the ratio 
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(X- < X* >)/o* is the same for C ^ and C = 0: 

X*- < X* > _ X- < X > 

a* a 

* * 

X* = X • — + < X* > - < X > — (32) 
a a 

The ansatz for the systematic error within the frequentist approach is 

!^=<£> + C. /(O-^ (33) 

a* a a 

The C variable has a Gaussian distribution. The arbitrary function / with 
/(0) = 1 describes non-Gaussian systematic errors. Equivalently, for the Bayesian 
treatment the parametrization is 

±^> = <*> +{ . 9(0 .£2: (34) 

a* a a v ' 

A sufficient condition for the equivalence of (30) and (28) is 

P rec (X,-C)=P on (X*) (35) 

for any X and (. The minus sign on the left hand side appears for the following 
reason: The variable ( parametrizes a shift from the original distributions to 
the functions used by an arbitrary observer. In the Bayesian interpretation, the 
direction of the shift has to be inverted. Equations (31), (32), (33) and (34) have 
now to be inserted into (35). Consistency is reached, if and only if a* — a and 
/(C) = <?( — 0- I n general, this simple equivalence proof fails, if one tries to intro- 
duce an X dependence into the functions /, g or to violate the shape invariance 
(31). The origin of the symmetry requirement on /, g is illustrated in figure 12. 

The conclusion is that the equivalence between the frequenstist and the Bayesian 
treatments of systematic errors is partly very general, but there are also limita- 
tions: 



• The distribution of the test statistics may be arbitrary. 

• The distribution of systematic shifts may be an arbitrary function. 

• However, equivalence is garanteed only, if systematic errors shift the X 
distributions, but keep their shapes, and 

• there is invariance of systematic shifts against translations of the test statis- 
tics. 



Apart from exceptions, the Bayesian treatment of systematic errors does not agree 
with the frequentist approach, if one of the last two conditions is not fulfilled. 
Even if both approaches agree, the distribution of confidence levels, averaged over 
many observers, are not nessessarily uniform, as explained in the last subsection. 



24 



6.4 Numerical treatment of systematic errors 



A repetition of the folding operations (14) inside a Monte Carlo loop based onto 
(24) would be a very time consuming analysis. It is a much simpler procedure to 
use the shape invariance (31) and the additivity assumption (33) with g(() = 1. 

The inclusion of systematic errors into the final results is then straightforward: 
With the help of Nmc Monte Carlo experiments (24) and the definitions (3), (4) 
the systematic error is obtained from the mean x 2 

2 _L_ ^ < X* > _ < X > 2 

JVMC MC experiments 

It has to be noted that this expression is written in a form which garantees a 
cancellation of an arbitrary scaling factor in the Wki, and that the expectation 
values involved can be computed without the folding procedure (14). 

Equation (30), together with (31) and 33, leads to a folded distribution, from 
wich the corrected confidence levels can be computed: 

I roo 

P corr {X) = -j=\ d(- exp(-C 2 /2) • P on {X + txsysv) 

V 27T J-oc 

rX 

C Lcorr(X) = / dY ■ P C orr(Y) 

Jo 



The parameter Xsys is different for background and a combination of signal 
and background and it depends on the overall signal to background ratio. If r 
has to be modified to find a rate limit, Xsys has to be reevaluated. 

The procedure has the advantage that it avoids a conceptual problem which 
exists otherwise for the extraction of rate limits from CL S . Without systematic 
errors, CL sb is a monotone function of CL b , if the test statistics is eliminated. 
This function becomes observer dependent in the presence of systematic errors, 
which raises the question how CL S should be defined. In the above approach the 
ratio of folded functions CL^ and CL\, is the natural choice. This method has 
been suggested ealier for counting experiments by Zech [19]. 

One has to keep in mind that the whole procedure is an approximate one and 
can have biases. 



6.5 Poisson distribution at small rates 



The frequentist approach as introduced in sect. 6. 2 can not be applied to the 
Poisson distribution. Here, the Bayesian ansatz has to be taken. The Poisson 
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distribution violates the criterion of shape stability, as introduced in subsection 
6.3. 

This raises the question whether the Bayesian treatment gives a reasonable 
spectrum of reconstructed confidence levels for the Poisson distribution at low 
rates. As an extreme case, which has nevertheless practical relevance for back- 
ground estimates, the problem has been studied for a very small mean rate no = 2 
with a big Gaussian systematic error of 20%. The formalism how to get corrected 
confidence levels is described in ref.[18]. For n observed candidates the result is 

C7L(n)=f>(0 

i=0 

with 

7(0) = exp(-n^+ ^a 2 sys ) 
7(1) = (Ho-oi.) -1(0) 

2 2 

j( n ) = n ° ~ °JM1 . U n - 1) + °JM1 . U n - 2) 
n n 

The following test has been done: A set of potential observers was introduced 
with different assumptions on the mean rate. For any observer a new Poisson 
distribution was generated and for any number of counts n the corrected confi- 
dence levels CL(n) were computed. The entries in the overall CL histogram were 
weighted with the true probability to find n counts. 

Fig. 13 shows the distributions of confidence levels for the special example. 
The differential spectrum of corrected confidence levels is not uniform at high 
CL, it has still a spike at CL = 1. The right column shows the cumulative 
CL distribution in a two-sided logarithmic representation. The result does not 
approach the diagonal at CL = 1. One could argue that the same relative error 
was assumed for all potential observers and that a more adequate choice would 
be 5 ~ 1/y/riQ. Tests have shown that this ansatz gives little improvement but 
does not cure the problem. 

An exceptional case was recently published by Bityukov who investigated 
statistical errors of Monte Carlo simulations as source of systematic errors in 
counting experiments [20] . If the mean rate no is taken from a simulation based 
onto Poisson statistics which has the same mean value as the data sample, the 
effect is absent. The criteria of shape stability of the distribution and translational 
invariance of systematic errors are violated and these effects cancel. 

It is the conclusion that indications for discoveries obtained from low statistics 
samples should be considered with care, if the background has a substantial 
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uncertainty. Even after correction for systematic errors the significance of the 
observation is still overestimated and this bias has to be studied. 



7 Event weighting with systematic errors 

In the preceding section, systematic errors have been added to the final results, 
but the filter function (9) was optimized with respect to statistical errors only. 

If search channels with much different systematic errors are combined or 
an uncertain amount of low weight background events contributes to fractional 
counting, this is not the best way to analyse data: bins with large systematic 
errors have to be downgraded. 

The procedure described in sect. 2. 2 can be generalized to do this. Again the 
limiting case of Gaussian distributions for the test statistics is considered. The 
generalization is straightforward for the three cases R — > 0, R — r/2 and R = r, 
which cover the range of R values in formula (9). 

• (a) R = r/2. 

The optimization criteria (i),(ii) had the following form: The probability 
that an arbitrary measurement of signal and background events gives a 
total weight X less than or equal to the weight of an arbitrary background 
sample, should have a minimum. This condition needs now the supplement: 
The total weights X for the comparison are measured by independent, 
arbitrary observers. 

• (P)R = r. 

This criterion minimized the fluctuation of signal and backgrond down to 
the background expectation. It had the form < X > s /(X sb =max., The 
systematic errors have to be included into the denominator now. 

• (7) R = 0. This criterion minimized the background fluctuation up the 
signal plus background expectation. The ratio < X > s ja^ has to be a 
maximum, with the systematic errors included in cr&. 

Criterion (a) is a bit more complicated than the others and means 
o- 2 /(^w ki s ki ) 2 = (a 2 sb + of) / \J2 w ki s ki) 2 = min. 

ki ki 
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The contributions of systematic errors to the variances have to be computed with 
(3) and (24): 

^-E^-(^ + M + E( a< J >sb ) 2 

ki j a V 

^ 2 = E<-^ + E(^^) 2 

ki j a M 

<^b = E w U s ki + hi) + E(E w ki(<7j% + ^SL)) 2 

ki j ki 

°l = E w li b ki + E(E ^fe^Si) 2 

ki j ki 

The optimization leads to the following equations: 

W ki ■ (Skih + b ki h + b ki k 2 ) +J2 W l™- E( a j?m + (7 S ) m)( (7 S + ^li) * fc l 

im j 

+ E w «™ ' E a t}m°fii ■ h = s ki (36) 

The cases (/3) and (7) are included too: one has k± — k 2 — 1 for condition 
(a),/c2 = for condition (/3) and k\ — for (7). The double sums correct the 
weights (9) for systematic errors, but they contain the final result so that the 
system of linear equations (36) has to be solved. 

Among the weights one may find negative values. Mathematically there is 
nothing wrong with this: The algorithm tries to extract information on back- 
ground from signal tails and to extrapolate this into the signal region to improve 
the accuracy. However, because the errors on the shapes of £ distributions are 
not well known and were even ignored in (25), the appearance of negative weights 
is completely unacceptable. To drop bins with low signal content, equation (36) 
can be supplemented by the request that all Wki should be positive or 0. 

Together with this condition, (36) has a unique solution. 

Let iV be the total number of histogram bins. The normalization condition 
X s = J2ki w kiSki=const. defines an (iV— l)-dimensional hyperplane in the space of 
weights Wki- The N inequalities Wki > define an (N—l) hyperplanar object with 
N corners within this hyperplane, a so called simplex. The simplest examples 
are a connection line for iV = 2, a triangle for N = 3 and a tetraedron for N = 4. 
At the corners only one of the Wki is greater than 0. The surface of the simplex 
consists of N hyperplanar objects of dimension (N — 2), which are simplices again. 
The simplest examples are the end points of the connection line for N = 2, the 
sites of the triangle for N = 3 and the surface triangles of the tetraedron for 
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N = A. These surface elements are characterized by one vanishing wu- Two of 
the (N — 2)-dimensional surface elements have one (N — 3)-dimensional simplex 
in common. There are N • (N — l)/2 of these objects, on which two weights 
vanish. This decomposition can be repeated until one reaches the corners. All 
curvature components on these substructures vanish. 

The condition a 2 /{J2ki w kiSki) 2 — P defines an iV-dimensional hyperellipsoid, 
whose size depends on the constant p. For sufficiently small values all points of the 
simplex w ki > lie outside the hyperellipsoid. Because both the error ellipsoid 
and the simplex are convex and all curvature components of the ellipsoid are 
non-zero, there exists exactly one value of p, for which the simplex becomes a 
tangential object of the hyperellipsoid. The coordinates of the tangential point 
are the weights. 

The point computed with (9) lies in the interior of the N — 1-dimensional 
simplex. In general, the error ellisoid containing it will have a larger value of 
p than this solution. The ansatz (36) leads to a better discrimination between 
hypotheses (A) and (B) with the same optimization criterion, even if less bins 
are used. 

This is illustrated in fig. 14, which shows expected upper rate limits for a 
Gaussian signal arising from a constant background. On the left hand side, the 
original weights (6) are compared with the result of (36). It turns out that 
the region of accepted events around the signal peak is rather narrow, if the 
systematic errors are comparable to the statistical ones. The acceptance window 
depends on the background level /3, which is again the number of events in the £ 
interval \[2txo^. The expected rate limits with the filter (36) (full lines) are lower 
than the limits computed with (9) (dotted lines). It is also evident from the figure 
that the ordering of curves is opposite for the same filters, if the systematic errors 
are not included in the statistical analysis. 

Results of similar quality can be obtained with (9) together with a cut on 
Ski/bki- This would have the consequence that another parameter has to be 
tuned. From a principle point of view, it would be some irony if a cut would be 
introduced here, because fractional counting was partly introduced to avoid hard 
cuts in event acceptance. 

The application of this weighting method is meaningful, if systematic errors, 
including their correlations, have the same the order of magnitude as the statisti- 
cal errors or are even larger. A relevant physical example is the flavor-independent 
search for Higgs bosons [21]. Compared to more specific Higgs searches, the back- 
ground is larger, but systematic uncertainties are very similar. Absolute upper 
limits grow with the square root of background, but the systematic error is pro- 
portional to it, so that systematic errors become important. 
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8 Summary 



The method of fractional event counting has been presented. The statistical 
analysis uses the frequentist approach. A very simple weight function with one 
free parameter was derived and it is described how it can be ajusted to get optimal 
separation of a physical signal from background. It turned out that there is no 
saticfactory optimization strategy for very low statistics experiments and it is 
proposed to use simply the signal-to-background ratio or the signal shape as 
weight. 

Very simple formulas are given to compute expected and observed confidence 
levels in the high rate limit, and for very simple examples like a Gaussian or a 
Breit-Wigner signal over a constant background analytic results are presented. 

A statistical test is suggested as a supplement to normal statistical analyses 
which is is based on polynomial statistics. It is sensitive to the ratio of signal 
and background spectra, but does not use the observed absolute rate for model 
comparisons. 

The frequentist and the Bayesian treatments of systematic errors are com- 
pared for a continuous test statistics, whose distribution has no local delta func- 
tion like spikes. Both approaches agree, if systematic errors introduce shifts of 
the distribution of test statistics without modification of its shape, and if the 
systematic shifts are invariant against translation of the test statistics. 

It has been shown that the Bayesian treatment of systematic errors in low 
statistics experiments is problematic: the results may be biased and this has to 
be studied. 

Finally a method was introduced to reduce the impact of systematic errors 
on confidence levels. It includes systematic errors in the event weights and does 
an automatic bin dropping to tolerate that detailed spectral shapes of systematic 
errors are often not well known. 
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Appendix: Comments on the comparison of two 
arbitrary hypotheses 



In the preceding sections, the physical hypotheses (A) and (B) differed by an 
excess of events in one of them everywhere. Often physical models are parameter 
dependent with locally different signs of cross section shifts. A simple example is 
the comparison of two angular correlations, where the measurement is based on 
a fixed number of events. 

In the following, it is summarized briefly, how the weighting has to be modified 
to handle the more general case. 

Let be and bki the local rates. The previous results are reproduced with 
flfci = + s^. The weight optimization can be repeated with the normalization 
J2wki ■ (ajfcj — bki) =const., and the result is 

U ■ (a ki — bki) 

w ki = 



U ■ a ki + (1 - U) ■ b ki 

with a free parameter U which replaces R. Similarly, U is an arbitrary renormal- 
ization factor. To garantee a positive denominator,?/ should be constrained to 
< U < 1. The weights can now become negative, but they have a lower and an 
upper bound. The folding procedures to get the distributions of the test statistics 
are the same, but X lies now in the interval — oo to oo and the lower integration 
limit in the confidence level integrals has to be set to a sufficiently large negative 
number. 

The statistical test for a fixed number of events, based onto the polynomial 
distribution, can be modified to use the polynomial likelihood ratio of the two 
models as discriminator. The event weights become then 

Wki = m — 

Oki 

This formula is symmetric and has singularities for a k i = and bki = 0; lower and 
upper cuts on the ratios of local rates are needed. An example for its application 
is the mentioned angular distribution check. 

Finally, the weighting with systematic errors leads to the linear equations 
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w k i ■ (a w • ki + b ki ■ k 2 ) + w im ■ a jM a j% ■ k i 

Im j 

Im j 

The numerical factors ki, k 2 are defined as before and depend on the hypothesis 
one likes to verify. 

The requirement of positive weights is meaningless. It was introduced to 
circumvent bad knowledge of the spectral shapes of systematic errors in regions 
where the difference between the models is small. Here, bins with a k i ~ b ki are 
not significant, and they can be dropped with the request that Wki must have the 
same sign as a ki — b k i- 
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Figure 1: Construction of the cumulated weight distribution of signal events from 
their £ distribution and the weight function w(£). 




1 2 3 4 5 

weight sum 

Figure 2: Spectra of the test statistics X for fixed numbers of events. The dis- 
tributions are for small signal to background ratio and a Gaussian signal over a 
constant background. The functions are given for the signal. 
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Figure 3: Count rates excluded with 95% confidence without background subtrac- 
tion, lower curve: Gaussian distribution, upper curve: Breit-Wigner resonance. 
The dots at integer abscissa values are the Poissonian limits from unweighted 
counting. 
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Figure 4: Limits on signal production rates from 3 events without subtraction of 
background. A Gaussian mass spectrum is assumed. The candidate positions are 
given by the points and the mass resolution is indicated by the error bars. Full 
curve: this work, dashed curve: Grivaz and Diberder, dotted curve: weighting of 
Gross and Yepes. 
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Figure 5: Count rates excluded with 95% confidence as function of the weight sum. 
The background is subtracted. The limits are for a Gaussian signal distribution 
and a constant background level j3. The weight is taken proportional to the signal 
to background ratio. 
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Figure 6: Dependence of the median expected 95% confidence limit on the rate 
parameter R. The background level is (3 = 10. 
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Figure 7: Median expected rate limits as a junction of the background level (3, if 
no signal exists. The lower curve gives the parameters R used to compute the 
limits. 
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Figure 8: Count rates excluded with 95% confidence as function of the weight sum. 
The background is subtracted. The limits are for a Gaussian signal distribution 
and constant background levels j3. The R parameters of fig. 6 were used to define 
the weights (see text). 
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Figure 9: Limits on the production rate from 3 observed events with subtraction 
of 3 background events. The data are identical to fig. 4- The curves are for 
different definitions of the weight algorithm and demonstrate the ambiguities in 
the analysis. 
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Figure 10: Confidence levels based on the polynomial distribution for a toy exam- 
ple. Upper left: Signal, background and candidate distributions. Lower left: The 
signal probabilities . Upper right: Confidence levels. The full and the dash- 
dotted curves are for the 'data', the smooth curves are median expectations (see 
text). The analysis assumptions are background for the upper curves, signal and 
background for the lower curves. Lower right: Number of events with > p cu t- 
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Figure 11: Distribution of reconstructed confidence levels for a Poisson distribu- 
tion with a mean rate of 100 events and a systematic error of 10%, as recon- 
structed by many observers. 
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Figure 12: Relationship between the frequentist and the Bayesian treatment of 
systematic errors. Central curve: Original distribution of the test statistics. X is 
a measured value. Right curve: Shifted distribution according to Bayesian error 
treatment for Q < 0. Dark area: Contribution to the corrected confidence level. 
Left curve: Shifted distribution according to the frequentist approach for the same 
value of (. The two horizontally hatched areas are equal by contruction. The 
agreement of both approaches is garanteed if the small marked areas are equal . 
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Figure 13: Spectra of confidence levels for a Poisson distribution with n = 2 and 
a systematic error of 20%, as reconstructed by many observers. Lower (upper) 
part: The confidence levels are corrected (not corrected) for systematic errors. 
Left: Differential distributions. The dots mark the results for the reconstructed 
confidence levels 90%, 99% and 99.9%. Right: Cumulative distributions. The 
step functions show the true original cumulative distribution of CL. 
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Figure 14: Expected upper rate limits for a non- existing Gaussian signal over a 
constant background. Left column: Weight functions. Dashed curves: weighting 
based on statistical errors only. Full curves: systematic errors included in the 
weights. The ordinate scale is arbitrary. Right column: 95% limits as a function 
of the background level. The lower two curves give the limits based on statistical 
errors only. The dotted (dash-dotted) lines correspond to the dashed (full) curves 
on the left hand side. Upper curves: Systematic errors are taken into account. 
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